13 research outputs found

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    The amount of data produced by information systems is increasing exponentially, which motivates the development of automatic techniques to process and mine these data correctly. In this Thesis, we tackle these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. In this Thesis, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for later tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. We tackle the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. Buoys very commonly stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of each buoy is analysed and exploited to reconstruct the whole missing time series. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. Time series segmentation consists in dividing the time series into different subsequences to achieve different purposes. This segmentation can be done trying to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for palaeoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions. 
However, since the expert had to evaluate individually every solution given by the algorithm, evaluating the results was very tedious. This led to an improvement of the GA so that the procedure is evaluated automatically. For significant wave height time series, the objective was to detect groups containing extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using a hybrid algorithm (HA), in which a local search (LS) process was included through a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities between different periods of European stock markets was also tackled, with the aim of evaluating the influence of different markets in Europe. Different techniques have been proposed for segmenting time series with the aim of reducing the number of points. However, this remains an open challenge, given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven coral reef optimisation (CRO) algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves on the state of the art with respect to accuracy and robustness. This problem has also been tackled using an improvement of the barebones particle swarm optimisation (BBPSO) algorithm, which includes a dynamic update of the cognitive and social components during the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost with respect to previously proposed coral reef methods. In addition, the simultaneous optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge that is also tackled in this Thesis. 
For that, a multiobjective evolutionary algorithm (MOEA) for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. Time series prediction is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems: the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is much lower than the number of standard values. Therefore, these values cannot be predicted using standard algorithms without taking into account the imbalance ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, which also requires special treatment. Preprocessed data from sensors located in different parts of Valladolid airport are used to build a simple artificial neural network (ANN) model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used to fix the threshold of peaks-over-threshold (POT) approaches. Also, the best-fitting distribution for the time series is determined and used to discretise it and make a prediction that treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
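
The approximation objective mentioned in the abstract above (segmenting a series to reduce its number of points) can be illustrated with a minimal sketch. It assumes that the segmentation is given as a sorted list of retained indices, that the series is rebuilt by linear interpolation between them, and that the error is measured as RMSE. None of these choices is stated in the abstract, and the function name `approximation_error` is hypothetical.

```python
import numpy as np

def approximation_error(series, cut_points):
    """RMSE between a series and its piecewise-linear approximation.

    cut_points: sorted indices retained by the segmentation; assumed to
    include the first and last index of the series.
    """
    series = np.asarray(series, dtype=float)
    cuts = np.asarray(cut_points, dtype=int)
    t = np.arange(len(series))
    # Rebuild the series by interpolating through the retained points only.
    approx = np.interp(t, cuts, series[cuts])
    return float(np.sqrt(np.mean((series - approx) ** 2)))

# Toy triangular series: keeping the peak reproduces it exactly,
# keeping only the endpoints does not.
series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
err_with_peak = approximation_error(series, [0, 3, 6])
err_endpoints = approximation_error(series, [0, 6])
print(err_with_peak, err_endpoints)
```

An evolutionary or swarm-based method such as those named above would search over `cut_points` to lower this error for a fixed number of retained points; the search itself is not sketched here.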

    Time series segmentation using a multiobjective evolutionary algorithm

    Extraordinary Master's Thesis Award, academic year 2015-2016. Computer Engineering

    A multi-class classification model with parametrized target outputs for randomized-based feedforward neural networks.

    Randomized-based feedforward neural networks approach regression and classification (binary and multi-class) problems by minimizing the same optimization problem. Specifically, the model parameters are determined through the ridge regression estimator applied to the patterns projected into the hidden layer space (randomly generated in the neural network version) for models without direct links, and to the patterns projected into the hidden layer space along with the original input data for models with direct links. For the multi-class classification problem, the targets are encoded according to the 1-of-J encoding (J being the number of classes), which implies that the model parameters are estimated to project all the patterns belonging to a given class to one and the remaining patterns to zero. This approach has several drawbacks, which motivated us to propose an alternative optimization model for the framework. In the proposed optimization model, model parameters are estimated for each class so that its patterns are projected to a reference point (also optimized during the process), whereas the remaining patterns (not belonging to that class) are projected as far away as possible from the reference point. The problem is finally presented as a generalized eigenvalue problem. Four models are then presented: the neural network version of the algorithm and its corresponding kernel version, for the neural network models with and without direct links. In addition, the optimization model has also been implemented in randomization-based multi-layer or deep neural networks. Funding for open access charge: Universidad de Málaga / CBU
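
The baseline that the abstract above sets out to improve (a random hidden layer followed by a ridge regression estimator on one-hot targets, without direct links) can be sketched as follows. This is not the proposed eigenvalue-based model. The tanh activation, the Gaussian random weights, the hyperparameter values and the function names are all assumptions made for illustration.

```python
import numpy as np

def fit_random_network(X, y, n_hidden=20, ridge=1e-2, seed=0):
    """Random hidden layer + ridge regression on one-hot targets."""
    rng = np.random.default_rng(seed)
    n_classes = int(np.max(y)) + 1
    # Hidden layer weights are drawn at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    # One-hot targets: one for the pattern's own class, zero elsewhere.
    T = np.eye(n_classes)[y]
    # Ridge estimator: beta = (H^T H + ridge * I)^{-1} H^T T
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def predict(X, W, b, beta):
    """Assign each pattern to the class with the largest output."""
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy data: three well-separated Gaussian blobs, 30 patterns each.
rng = np.random.default_rng(1)
centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

W, b, beta = fit_random_network(X, y)
accuracy = float(np.mean(predict(X, W, b, beta) == y))
print(accuracy)
```

A variant with direct links would, under the same assumptions, concatenate `X` to `H` before solving for `beta`.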

    Atherosclerotic Pre-Conditioning Affects the Paracrine Role of Circulating Angiogenic Cells Ex-Vivo

    In atherosclerosis, circulating angiogenic cells (CAC), also known as early endothelial progenitor cells (eEPC), are thought to participate mainly in a paracrine fashion by promoting the recruitment of other cell populations, such as late EPC, or endothelial colony-forming cells (ECFC), to the injured areas. There, ECFC replace the damaged endothelium, promoting neovascularization. However, despite their regenerative role, the number and function of EPC are severely affected under pathological conditions, making it essential to further understand how these cells react to such environments in order to implement their use in regenerative cell therapies. Herein, we evaluated the effect of direct ex vivo incubation of healthy CAC with the secretome of atherosclerotic arteries. Using a quantitative proteomics approach, 194 altered proteins were identified in the secretome of pre-conditioned CAC, many of them related to inhibition of angiogenesis (e.g., endostatin, thrombospondin-1, fibulins) and cell migration. Functional assays corroborated that factors released by healthy CAC enhanced ECFC angiogenesis but, after atherosclerotic pre-conditioning, the secretome of pre-stimulated CAC negatively affected ECFC migration, as well as their ability to form tubules in a basement membrane matrix assay. Overall, we have shown here, for the first time, the effect of atherosclerotic factors on the paracrine role of CAC ex vivo. The increased release of angiogenic inhibitors by CAC in response to atherosclerotic factors induced an angiogenic switch, blocking the ability of ECFC to form tubules in response to pre-conditioned CAC. Thus, we confirmed here that the angiogenic role of CAC is highly affected by the atherosclerotic environment.

    Strengthening the Data Scientist professional profile through competition dynamics

    Data Science is the field that covers the development of scientific methods, processes and systems for extracting knowledge from previously collected data, with the aim of analysing the procedures currently in use. The professional profile associated with this field is that of the Data Scientist, which requires broad knowledge of statistics, mathematics and programming, among other areas. The role of Computer Engineers as Data Scientists is therefore fundamental in society, since the skills and competences acquired during their training closely match what this job requires. Owing to the need to train new Data Scientists, among other purposes, platforms have emerged where they can gain extensive experience, such as Kaggle, which offers a meeting place where companies typically publish data for researchers to work on and solve. The main objective of this teaching experience is to give students hands-on experience with a real problem, as well as the opportunity to cooperate and compete at the same time. In this way, the competences required in Data Science are acquired and developed in a highly motivating environment.